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AUTOCOVARIANCE ESTIMATION IN REGRESSION WITH A 
DISCONTINUOUS SIGNAL AND m-DEPENDENT ERRORS: A 
DIFFERENCE-BASED APPROACH 

INDER TECUAPETLA-GOMEZfi) AND AXEL MUNK^^’^) 

Abstract. We discuss a class of difference-based estimators for the autocovariance in non- 
parametric regression when the signal is discontinuous (change-point regression), possibly 
highly fluctuating, and the errors form a stationary m-dependent process. These estimators 
circumvent the explicit pre-estimation of the unknown regression function, a task which is 
particularly challenging for such signals. We provide explicit expressions for their mean 
squared errors when the signal function is piecewise constant (segment regression) and the 
errors are Gaussian. Based on this we derive biased-optimized estimates which do not 
depend on the particular (unknown) autocovariance structure. Notably, for positively cor¬ 
related errors, that part of the variance of our estimators which depends on the signal is 
minimal as well. Further, we provide sufficient conditions for -^/n-consistency; this result is 
extended to piecewise Holder regression with non-Gaussian errors. 

We combine our biased-optimized autocovariance estimates with a projection-based ap¬ 
proach and derive covariance matrix estimates, a method which is of independent interest. 
Several simulation studies as well as an application to biophysical measurements complement 
this paper. 


1. Introduction 

In nonparametric regression with correlated errors, 

yi = fixi)+ei, i = l,...,n, (1.1) 

where {xi) are the sampling points, / is an unknown mean function or signal, and (e*) are 
zero mean stationary time series errors, the autocovariance ■jh = E[eiei+/i], h = 0,1,..., 
plays a prominent role for various tasks. When the signal / is smooth, the autocovariance 
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appears, e.g., in the asymptotic variance of kernel estimators of / and is important for 
bandwidth selection and for inferential procedures, cf. Opsomer et ah (2001). In general, 
knowledge of the autocovariance, and in particular of the variance = 70 , is required for 
efficient signal estimation, e.g. in wavelet-based estimation the autocovariance can be used 


for improved thresholding of the empirical wavelet coefficients, cf. Johnstone and Silverman 


(1997), Von Sachs and MacGibbon (2000), Kovac and Silverman (2000). Additionally, when 
the signal is discontinuous, which will be considered in this paper, the autocovariance func¬ 
tion is required to provide efficient estimates and conhdence regions for the location of the 
discontinuities as well as the magnitude of their corresponding jumps, see Section 7.2| for 
an example. Autocovariance estimation under a discontinuous and potentially highly fluc¬ 
tuating signal, however, is a notoriously difficult task in general and some knowledge about 
the dependence structure is necessary. Therefore, throughout this paper we will consider 
zero mean, stationary, m-dependent errors, i.e., 7 ^ = 0 for \h\ > m; 7 ^ 7 ^ 0. Although 
m-dependency ensures that the autocovariance function is zero starting at lag m -|- 1 , this 
characteristic of our error model can often be construed as a convenient proxy to more gen¬ 
eral situations, e.g. when the autocovariance function decays exponentially with increasing 
lag. 

Regression models with discontinuous signal and m-dependent errors as considered in this 
paper are of relevance in several areas of application. Figure displays time series from 
two common biophysical measurements: (A) recordings of an ion channel trace and (B) 
the trajectory of a molecular dynamic protein. For (A), the mean is typically modelled 
with a locally constant signal (also called change-point segment regression) according to the 
openings and closings of the channel, cf. VanDongen (1996), and m-dependence results from 
the low-pass hlter utilized to digitize ion channel measurements, cf. Hotz et ah (2013). For 
(B) a more flexible signal assumption {piecewise smooth change-point regression) seems in 
order and often m-dependence can be conhrmed empirically. 

The contributions of this paper are also relevant for change-point estimation and detec¬ 
tion which have been investigated extensively in the particular case of independent errors. 


see e.g. Page (1954, 1955), Diimbgen (1991), Brodsky and Darkhovsky (1993) and Carlstein 


et ah (1994) for some early references. More recently, and arguably motivated by large 


scale applications, e.g. from genetics, the focus has been on the recovery of signals with a 


potentially large number of change-points, see e.g. Olshen et ah (2004), Fearnhead and Liu 


(2007) 

Spokoiny ( 

2009), 

Harchaoui and Levy-Leduc 

(2010 

, Killick et al. 

( 2012 ) 

Siegmund 

(2013) 

Frick et al 

(2014 

), Du et al. 

(2016 

) and 

Li et al. 

2016 

) among many others. For 


serially correlated errors, change-points estimates have been primarily investigated asymp¬ 


totically and when the number of change-points is hnite (but unknown) see e.g. Davis et ah 


(2006), Fryzlewicz and Subba Rao (2014), PreuB et ah (2015) and Chakar et ah (2016). 


2 














































































































Figure 1. (A) 60 s of gramicidin A, see Section]^ for further details about this dataset. 
(B) 20000 ns of the trajectory of the distance between a backbone atom and the first center 
of mass of water channel AQYl, cf. Krivobokova et aT| (|2012 ) for further details. 


To some extent, such an asymptotic analysis resembles finite sample situations when the 
number of change-points is small (relative to the sample size) and variance-covariance esti¬ 
mation becomes less cumbersome ( [Picard (1985), Huskova et al. (2007)). Autocovariance 
estimation may even be disregarded for consistent change-point estimation over large classes 
of dependency structures (Lavielle and Moulines (2000), Bardet et al. (2012)). 

In contrast, in this paper we mainly adopt a non-asymptotic perspective motivated by 
highly fluctuating signals having a potentially large number of change-points as it is known 
that this will increase the finite sample bias of any standard variance-covariance estimate. 
Hence, subsequent use of these biased estimates may significantly reduce the efficiency of 
change-point statistics, cf. Eqs. (13)-(15) of Jandhyala et al. (2013). As we will show later 
on, even when the number of change-points increases with the sample size, it is still possible 
to account for bias-reducing and -y/n-consistent estimates for 7 (.). This is reflected by the 
fact that estimation of 7 (.) is simpler than that of the entire signal /, which is well-known 
for 7 o = in the independent case (e.g. Spokoiny (2002)). 

In summary, in those situations when the signal fluctuation becomes dominant (which 

will be made precise later on), pre-estimation of the signal / is notoriously difficult and 

direct estimation of 7 (.) becomes pertinent. This paradigm is well known, in particular, 

for independent noise, i.e. for estimation of the variance cx^. For this task, difference-based 

estimators provide a simple and practical solution: a difference sequence {Aj} is a sequence 

3 






































of real numbers such that 


Ai = 0, 


A? = 1. 


( 1 . 2 ) 


/ , 0 “7 / j 

Assume that Aj = 0 for i < —li and i > I 2 , and A_i^ 7 ^ 0 with /i, I 2 > 0. Then I = I 1 + I 2 


is called the order of the sequence; usually /i = 0 and I 2 = L Following Hall et al. (1990) a 
difference-based estimate of cr^ has the form 

n-h / \^ 

d-2 = (n-/)“^ I^AiT/i+fc) . (1.3) 

fc = /l + l 


When the signal is smooth, estimators of this type have been investigated extensively see 


e.g. Rice 

(|1984 

), 

Gasser et al. 

Spokoiny 

(2002 

), 

Brown et al.| 

Munk et al. (2005 

) and others. 


can be adapted to high fluctuation of the signal, i.e. for bias reduction. As we will see in this 
paper, this feature makes difference-based estimators particularly useful also in applications 
with correlated errors where the signal exhibits high fluctuation and discontinuities. 

Difference-based estimators have also been used in nonparametric regression with station¬ 
ary errors. For instance, Miiller and Stadtmiiller (1988) proposed estimators based on differ¬ 
ences of hrst order to estimate (invertible) linear transformations of the variance-covariance 
matrix of stationary m-dependent errors. Herrmann et al. (1992) suggested differences of 
second order to estimate the zero frequency of the spectral density of stationary processes 
with short-range dependence. For autoregressive errors. Hall and Van Keilegom (2003) pro¬ 
posed \/n-consistent and, under normality, efficient autocovariance estimates. Under some 


mixing conditions. Park et al. (2006) suggested to estimate the autocovariance function ap¬ 
plying difference-based estimators of hrst order to the residuals of a kernel-based ht of the 
signal. Most close to our work is Zhou et al. (2015), who provide an optimal difference- 
based estimate of the variance = 70 for smooth nonparametric regression when the errors 
are correlated. Their optimized weights, however, depend on the remaining values of the 
autocovariance function, i.e. 'fh, h ^ 0, which in general are unknown. In contrast, in this 
paper we estimate the entire autocovariance function and our estimates depend solely on 
m. All the methods just discussed have been analyzed for smooth signals and to the best of 
our knowledge, derivation of optimal weights for autocovariance difference-based estimates 
in the case of a discontinuous and highly fluctuating signal still remains elusive and becomes 
the main focus of our work. 

Summarizing this paper, we begin by suggesting the use of difference-based estimators of 
the autocovariance of the error process in nonparametric regression (1.1). Then we show 
that for m-dependent errors we need to use differences of gap m -|- 1. We obtain hnite- 
sample results for the mean squared error (MSE) of these estimators. This MSE includes 
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a bias term which depends on the unknown regression function /, in situations where / 
fluctuates substantially this term could dominate. We show that to minimize this bias term 
(and that part of the variance which depends on / as well) we should use estimators based 
on only hrst or second order differences and give explicit forms for the optimal choice of 
weights in the difference-based estimators. Further, we provide sufficient conditions for -in¬ 
consistency; this result is extended to piecewise Holder regression with non-Gaussian errors. 
A more detailed account of these and other results is presented in the next section. The 
theory is complemented by simulation studies, a data example, and statistical software for 
autocovariance estimation. 


2. Main results 


As a prototypical example of a regression model with a discontinuous signal we consider for 


the moment (1.1) with a signal / that is locally constant {change-point segment regression) 


and hence admits the representation 

K-l 

fix) = ajl[rj,rj+i){x), X e [0, 1), Oj ^ Oj+i. 


( 2 . 1 ) 


j=0 


Here the change-points of /, 0 = tq < ti < ■ ■ ■ < tk-i < tk = 1, its levels {aj)o<j<K-i, 
and the number of discontinuities iF G N are unknown and can be potentially large. Since 


for large enough K any discretized function can be represented as in (2.1), this equation 


describes a wide class of signals. For simplicity of presentation we will assume that the 
sampling points (x,) are equally spaced on [0,1), i.e., Xi = i/n. Further, fi will denote f{xi). 


2.1. Autocovariance estimation. Now we introduce the class of difference-based estima¬ 
tors to be considered for m-dependent errors. Let Y denote the vector of observations yi 


following (1.1). For 1 < /h < n, a generalized difference-based estimator of order I and gap 


h is a random quadratic form 


Qk(Y, wi) = 


n—lh 


Y idoVi + diVi+h + d2yi+2h ... + di Vi+lh) : (2.2) 


P{wi){n-lh) 

where wi = (do di ■ ■ ■ di)~^ G is a vector of numbers (weights) satisfying 

i 

^ di = 0, 

i=0 


(2.3) 


and P{wi) = Yl!i=od'l. Setting h = 1 in (2.2) we get difference-based estimators of order Z, 
cf. Eq.( 1.3| ). For I = 1 and lUi = (1 — 1)"'' G P{wi) = 2 and for 1 < h < n we get the 










ordinary difference-based estimator of gap h (for h = 1 see Rice (1984)): 

^ n—h 

1 ' \2 


5^^'^ = 


iVi Ui+h) 


(2.4) 


2 (n -h) 

Note that for independent errors in (1.1), £[5*^^^] = 70 + o(l) provided J2i=i(fi ~ fi+h)'^ = 
o{n) and we can use the estimator (2.4) to get asymptotically unbiased estimates of the 
variance. In contrast, for stationary errors the situation becomes more complicated as then 
£[?('')] = -yg - + 0(1). 

We will show for the change-point segment model (2. ^ (Theorem of Section |3.1| ) that 
for m-dependent errors any estimator of the variance 70 = cr^ whose MSE tends to 0 and 
that satisfies (2.2)-(2.3) must have gap h at least m -|- 1. Therefore, we will focus on this 
type of variance estimators in the following. We will further restrict ourselves to estimators 
of first or second order, see Lemma below for justification of this. We commence now a 
discussion about the MSE of this class of estimators. 

Let Tim = n — 2{m + 1) and w.l.o.g. set do = 1, then estimators for 70 based on differences 
of second order [I = 2 ) and gap m -|- 1 in ( 2 . 2 ) can be written as 

(1 -|- d -|- cP)~^ 


7”V) = 


2 Hr. 


{Vi ~ (1 + d)yij^(m+l) + d|/i+2(m-|-l)) , d G 


(2.5) 


2=1 


using (2.3). Combining the ordinary difference-based estimator of gap h, cf. (2.4), with 
(2.5) we will estimate the remaining values of the autocovariance function 7 / 1 , h = 1 ,..., m. 
Namely, 

lh^\d) = l^\d) — d e M, h=l,...,m. ( 2 . 6 ) 

Here d G M is a parameter to be optimized later on. In order to present our first result we 
need to introduce the quadratic variation of / in ( 2 . 1 ): 

K-l 

Jk ■= ^(oj+i " 

j=0 


a,f. 


(2.7) 


Theorem 1. Suppose that in the segment regression model (1.1)-(2.1) the noise {ei)i<i<n is 
a sample from a zero mean, m-dependent, stationary Gaussian process with autocovariance 
function 7/1 = E[eieiJ^h\, h = 0,...,m. Additionally, assume that the change-points of f, 
0 = To < Ti < ■ ■ ■ < tk -1 < tk = I, satisfy that 


min Ixj+i — rd > 4(m-|-l)/? 7 ,. 

l<i<K-l 


( 2 . 8 ) 


Then, for m > 1 

MSE[7S'”^(d)] = Po(d) (pi(d; 7(.)) 70 Jr + np2{d; 7(.)) + psid; 7 (.))), (2.9) 


BIAS2 


VAR 
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and for = 1 ,..., m 


MSE[^jf^\d)] = n ^ {p*oY{d) Jl + n ^ {pl{d]-f^.)) Jk + np*{d;-f^.)) + pl{d;-f^.))) . (2.10) 


BIAS^ 


VAR 


See Section 3.2 for explicit expressions of po, Pq, pi, pi, p 2 , pi, ps and pg. We remark that 
Gaussianity is only needed to get the variance explicitly. 

Remark 1. As mentioned above for simplicity we are considering equally spaced observa¬ 
tions sampled from the unit interval at rate 1/n. Note, however, that our results can be 


transferred to general sampling points as we provide the finite sample MSEs (2.9)-(2.10) and 


the only restriction is ( 2 . 8 ) which transfers to general sampling points by replacing r,- by 


[nTj\, where [xj denotes the integer part of x. 

Now let us discuss Theorem As an immediate consequence we deduce that the influence 


of / over the MSEs (2.9)-(2.10) only appears through its quadratic variation Jk (2.7). Let 
us hrst consider (2.9). Here the bias consists of n~^pQ{-), a rational function, times Jk- We 


can further dehne the extended bias, BIAS*, oijlf^\d) as the part of the MSE in (2.9) that 
depends on /: 

BIAS*[ 7 j”'^(ci)] :=n~^ [pl{d) + Piid^'y^.))'yo Jk] - ( 2 . 11 ) 

In order to justify our dehnition of extended bias, note that unlike pi, the rational functions 
P 2 (-; 7 (-)) and ps^-; 7 (.)) do not align with / and these functions affect the part of the variance 
which solely depends on the autocovariance. A similar analysis and conclusions can be made 
for the estimators in ( 2 . 10 ). 


In view of (2.9) and (2.10) we deduce that for highly oscillating and discontinuous signals. 


in ( 2 . 1 ), e.g. for large K, the influence of Jk over the MSEs may become dominant and 


might not be negligible even for very large sample sizes. On the contrary, the influence of 
the unknown autocovariance function 7 (.) over the MSEs is comparably small in this situation. 


In light of the above we focus on finding variance estimates in (2.5) and autocovariance 


estimates in (2.6) whose MSEs (see Theoremhave minimal influence from Jk (and hence 
from /). Our main results on autocovariance estimation for change-point regression with 
stationary m-dependent errors are stated now. 

Estimation of 70 . From Theorem[^we get that BIAS[ 75 ”^^(d)] = n“Vo(d) Jk, see (2.9). 


Since for m eN, mindgKPo(d) = Po(l) = ((’^ + l)/3)^, for details see (3.3), we have proven 
the following: 


Theorem 2. Let 7 o”*^(d) be the difference-based estimator given by (2.5). Suppose that in 


the segment regression model ( 1.1 )-( 2.1 )-( 2 . 8 ), the noise (ej)i<j<n is a sample from a zero 
mean, m-dependent, stationary process. Then for m > 1, BIAS [ 79 ”^^ (d)] is minimized at 


























d = 1, i.e., by the estimator 


7^(1) = 


QUr 


' '■"IIL 

^ ^ yi-\-{m-\-l) “1“ ?/i+ 2 (m+l)) 7 ^ 2(77Z “h !)■ 


( 2 . 12 ) 


2=1 


Note that in Theorem a normal error assumption is not needed. Already [Herrmann 


et ah (1992) have suggested this estimator in the context of nonparametric regression with 


a smooth signal and stationary errors. Note that Theorem provides further justihcation 
for its use particularly when the signal is discontinuous. In fact, assuming further that the 
correlation is non-negative, we found that even the extended bias of is minimized at 


d = 1. More precisely we get the following result whose proof can be found in Section 3.2 


Theorem 3. Suppose that the conditions of Theorem^^hold. Assume, additionally, that the 
autocovariance function of the noise (£i)i<i<n belongs to either of the following classes: 

1 . jh = PJo, for h = 1,... ,m such that 


I + 2 p ^ cos(h A) >0, VA e [—TT, tt] . 
2 . 7 /i > 0 for h = 1,... ,m. 


(2.13) 


h=l 


Then form > 1, the estimator in (2.12) minimizes the extended bias BIAS*[ 7 o’"'’(d)] 

cf dmi). 


(m), 


Eq. (2.13) ensures that the function ■jh = P7o for h = 1,...,m and zero otherwise, is 
the autocovariance of a zero mean, stationary, m-dependent process, cf. Corollary 4.3.2 of 


Brockwell and Davis (2006). 


Estimation of ■jhi h = 1,... ,m. We will show that the following weights optimize the 


bias of estimators given by (2.6): 


dh,m 


hiyf h'^—4{m+l—hP 
2(m+l—h) 


for h < ^{m + 1) 
otherwise 


(2.14) 


Theorem 4. Suppose that the conditions of Theorem~^hold. Then for m > 1, BIAS [ 7 ^™^ (d)] 
is minimized at dpm, cf. (2.14). In particular, for those values of h such that h > 1), 

is an unbiased estimate of'jh- 

Proof. Since BIAS[ 7 [”*^(d)] = pQ{d)JK/n, cf. Theoremj^ wherePo('^) = Po{d) — h/2, cf. (3.8), 
we only need to minimize Po(') w.r.t. d. For 3h > 2{m -|- 1), dh,m, cf. Eq. (2.14), is a root of 
Pq. For 3h < 2{m + 1), straightforward calculations yield that d = 1 is a global minimum of 
Pq on M. This completes the proof. □ 

Observe that the underlying autocovariance 7 (.) does not appear in the expression for 
h = 1,... ,m. In contrast, the corresponding extended bias of J^\d), d G M, 























depends on the nnknown 7 (.) in an intricate fashion, cf. Eq. (2.10)-(3.10)-(3.11)-(3.12), hence 
full minimization of this function is practically infeasible. 


In summary, we hnd that the difference-based estimates given by (2.12) and (2.14) are 


bias-reducing for the autocovariance of wide classes of stationary m-dependent processes 
and discontinuous signals. Additionally, under Gaussianity and for non-negative correlation, 
(2.12) is even extended-bias-optimal. Moreover, Theorem]^ of Sectionestablishes that for 
piecewise Holder continuous signal (which includes segment regression (|2.1[)) with general 


(non-Gaussian) errors whose all moments up to order 4 are stationary, the estimates (2.12)- 


(2.14) are \/n-consistent for 7 (.) as long as the number of discontinuities Kn = o{y/n). 


2.2. Covariance matrix estimation. Let L be the nxn symmetric Toeplitz matrix whose 
hrst m -|- 1 entries of its hrst row are hlled with h = 0,1,... ,m, cf. Eqs. (2.12)- 


(2.14), and the remaining n — {m+l) entries are zero. As in general, pointwise autocovariance 


estimates jh, h = 0,1,... ,m, may lead to a covariance matrix estimate which is not positive 


dehnite, cf. Hall and Van Keilegom (2003), E may not be the exception. In order to overcome 


(2.15) 


this problem we propose a projection-based estimator. Dehne 

r = P,(^)(f) := argmin{||f - G 

that is, the unique projection of T onto the closed convex set of all n x n symmetric, 
positive semidehnite, (m-l-l)-banded Toeplitz matrices. In (2.15), ||-Hi? denotes the Frobenius 


norm. We show that the projection-based estimate F* always has smaller mean squared error 
than the pointwise-hased estimate F, cf. Theorem]^ in Section]^ This theorem might be of 
interest on its own since it relies on a general projection principle which can be applied to any 
ensemble of individual autocovariance estimators to obtain a symmetric, positive semidehnite 


Toeplitz (covariance) matrix estimate. Our hnal autocovariance matrix estimate F* in (2.15) 
can be computed numerically by a Dykstra-type alternating projection algorithm which we 
will detail in Section [All 


2.3. Numerical studies. Finite sample properties of our bias-reducing autocovariance es¬ 
timators (2.12)-(2.14) are investigated in a series of simulations in Section]^ and compared 


to other estimators from the literature: Section 6.1 assesses the performance of these esti¬ 
mates for autocorrelation estimation, their robustness against normal distributed errors is 
also studied; robustness against the assumption of a piecewise constant signal is explored in 


Section 6.2 When the signal is discontinuous we found that our estimates outperform all 
the others under comparison. For smooth signals and positive correlation our estimates are 
competitive to some optimized kernel-based estimates. 
























2.4. Applications. In Section we analyze the dependence structure of the data exam¬ 
ple shown in Figure ^ We found a 6 -dependent process as appropriate and estimate its 
autocorrelation. We found that is in close agreement to that obtained by theoretical consid¬ 
erations from low-pass hltering. Although in this paper we only considered an application 
on biophysical measurements, we stress that our method may proven useful for many other 
areas such as the analysis of hnancial time series or sequential data in genetics. Finally, 


in Section |7.2| we show how our estimates can be used to improve considerably the esti¬ 
mation accuracy in the m-dependent case of a change-point estimator initially designed for 
independent data. 

2.5. Software and Supporting Information. The methods discussed in this paper are 
available in the R package dbacf (http://www.stochastik.math.uni-goettingen.de/dbacf). In 


this software we have implemented the estimates (2.12)-(2.14) (see function dbacf) as well 
as the alternating projection algorithm from Sectionleading to F* (see nearPDToeplitz). 
We defer the proofs of most of our results to the Supporting Information of this paper. 


3. Optimization of difference-based estimators of autocovariance function 

In what follows we will say that an estimate Wn is consistent if MSE[lWi] —)■ 0 as n —)■ oo. 

3.1. On the consistency of generalized difference-based estimators for variance of 
m-dependent processes. 

The next theorem highlights that in a change-point segment regression with zero mean, 
stationary, m-dependent errors, consistent estimation of the variance 70 based on difference 


schemes already restricts this class to estimators with gap h at least m -|- 1 , see ( 2 . 2 ). Some 
technical details are deferred to Appendix |A. 1 [ 


Theorem 5. In the segment regression model (|1.1 )-(2.1 )-(2.8) with zero mean, m-dependent 


stationary errors, any consistent difference-based estimator for the variance 70 given by ( 2 . 2 )- 


(2.3), has necessarily gap at least m-\- 1. More precisely, let g and m be fixed integers with 


g > m and assume that there exists an integer A > 1 such that n = ^(^f -|- 1). Then, in 


( 2 . 2 ), the vector of weights Wn must have the form: 

Wn = (vq Vi ■■■ VN_iy e M*", 

0 ■ ■ ■ 0)"'' G i = 0,..., N — 1; do 7 ^ 0, dk-g 7 ^ 0 for some 


where Vi = {di.g 
the integer g. 


1 < k < N — 1, and '^^=0 dj.g = 0. Here j ■ g denotes the multiplication of the index j by 


Idea of proof. Since a consistent estimate is necessarily asymptotically unbiased, our line 


of argument consists of showing that for any difference-based estimate satisfying (2.2)-(2.3) 
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to be an asymptotically unbiased estimate of 70 , it is necessary that its gap be at least m + 1 . 
The technical details of the proof can be found in Appendix |A.1[ □ 


A key part of the proof of Theorem consists of computing the bias of a generalized 
difference-based estimator of order I and gap 1. This is provided by Lemma below and in 
order to present it now we introduce some notation. Recall that /* denotes f{xi) and for 
i <n-l, fi-.(i+i) denotes the vector {fi fi+i ■ ■ ■ fi+i^ e M'+b Let 


D 


djQ di (^2 ''' 

0 do di 



(3.1) 


For X G ||a;|| denotes its Euclidean norm. 


Lemma 1. Set ni := n — (I + 2) > 0 and suppose that the conditions of Theorem^ hold. 
Let Qi(Y, m;) he the difference-based estimator of order I and gap 1 given in ( |2.2[ ). Then 
for I < 4(m -|- 1).' 


BIAS[Qi(Y,m0]=nr'O 


Jk 

k=l 



(3.2) 


Idea of proof. Combining (A.2) and Proposition cf. Appendix A.l we get that 


ni 


BIAS[Qi(Y,mO] = nr'O 

\i=i 

The right-hand side of this equation is given explicitly in the aforementioned Appendix. □ 


Since the difference-based estimator of order I = 2 and gap (m + l), Qm+i(Y, W 2 ) cf. ( 2 . 2 ), 
can be represented equivalently with the weights 

m times m times 

nt2(m+i)-i-i = (do 0---0 di 0---0 ^ 2 ), 


where do + di -|- 0(2 = 0, Lemma tells us that the bias of such estimator is of order 
2(m -|- 1) max((ig, ^ 2 ) Jk/n. Similar calculations show that for I = 3, BIAS[Qm+i(Y, m 3 )] = 
(P(3(m -|- 1) Jk/n). That is, considering a high-order difference-based estimator (/ > 2) with 
gap m -|- 1 will increase the bias by a magnitude of order l{m 1), hence we restrict to the 
case / = 2 as our main focus is in bias minimizing estimators. 

3.2. Bias minimizer for autocovariance difference-based estimators of second or¬ 
der and gap m -|- 1. 

This section contains proofs for Theorems [T] and The auxiliary Lemmas 
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10 , 11 and 12 


can 


be found in Appendix |A. 2 [ 













Proof of Theorem\^ Let Um = n — 2{m + 1). In Eq. (2.5) write hfd) = dfd) + r]i{d), 
where dfd) = f-fi+m+i + d{fi+ 2 {m+i)-fi+m+i), Vi{d) = ei-e i+m+l “1“ ^(^i+2(m+l) 
i.e., 5i{d) carries the signal / while rjfd) carries the errors £. 

Since £\b‘l{d)] = dfi^d) + £[r}f{d)\, from Lemmas I and now follows that 

(m + l){d‘^ + 1) 


E[7j™V)] = 7o + qo{d) JK/rim, qo{d) = 


2 (d2 + d+ 1) 


(3.3) 


recall that Jk = (®i+i — cbjY- 

Set P{d) = 2{d? + d + 1). Since E\bf{d)] = 5f{d) + 6P{d) 6f{d)'jo + ^[vf{d)], we get from 
Lemmas [5] and |6l 


VAR(52(d)) = 4P(d)7o 6fd) + 2P\d)^l 

coy [bib]) = A5fd)5M)Ch{d)iiM)] + C[rif{dHm - P\d)^l 

Set A = di{d)5j{d)£[r]i{d)r]j{d)] and B = 2 X)*j C[rif{d)Tfj{d)]. Then 

VAR(7”V)) = (2n^PHdhl + 4(m + + l)P(d)7o * + A + B - <P"(<i)7„") 


which after an application of Lemmas from Appendix A.2 becomes 


= (pi(c(;7(-))7o^i^ + ^mP2(d;7(.))7o +P3 (c(;7(-))) > 


with 


2(m + l){d‘^ + 1) + ‘^Yl'h=i qlr\d) ph 
{d^ + d + iy 

P\d) + Y.tldK(d-,^i:))hl 

2(l + (i + (P)2 


Pi(d;i{-)) = 
P2(d-,%)) = 


(3.4) 

(3.5) 


and 


3m+2 


P3{d-,-f) = ^^/*(^;7(■))/7o^ 


(3.6) 


where ql^\d) = [2{m + 1) — 3h]{d‘^ + 1) + d'^h and A/i((i; 7 (.)) is given in Lemma ([^, see 
Appendix |A.2[ 

Now we consider = ^^\d) — 6d\ cf. (2.6). In light of the calculations above we 

only need to focus on 5d\ cf. (2.4), 1 < h < m < n. The arguments leading to E[ 7 Q™'^((i)], 


cf. (3.3), allow us to get that 


E[5d)] = 70 - 7, + 


h Jk 
2{n — h) 


(3.7) 
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Observe that a combination of (3.3)-(3.7) yields 


PoW =Po{i) - V2. 


(3.8) 


Next, combine (3.7)-(A.20)-(A.21) and Lemma 10, cf. Appendix A.2, to get that 

VAR(?l'->) = |2(n - A)|-^|Fi(7 (.)) Jk + 




The COV(7y^(rf),(5(^*) is given by (A.23) (see Lemmas |ll| and 
Appendix). Arranging all these terms we get that 


12 


(3.9) 

in the aforementioned 


Pi(c^;7(-)) =Pi(c^;7(-)) + 4 ^/i(7(-)) 


8(^2 - 1 ) 

P{d) 


Vhili.)). 


-U 1 IP s ^ 1 cW ^ 
ft(<i;7,.,)=Ps(d;7,.)) + j^S,„ + ^, 


=p3(d;7(.))72 + 

This concludes the proof. 


7o(7o - 7fe) 

P{d) • 


(3.10) 

(3.11) 

(3.12) 

□ 


Observe that neither the functions ibj(-), 14(-) or the quantities S 2 ^l{d), cf. (3.10) 


(3.11), depend on the signal / but only on the unknown autocovariance function 7 (.); also 
for any 1 < h < m, = 0{n) and S^l^{d) = C>(n), for any d G M. As a consequence 
of Theorem!^ we deduce that if Jk = o{^/n), then the estimates given by (2.12)-(2.14) are 
n-consistent. Hence we obtain the following as an immediate consequence. 


Theorem 6. Suppose that the assumptions of Theorem ^ hold. Let be given by 

(2.12)-(2.14). If Jk = o{^/n), then for h = 0,... ,m, 

B\AS[jl^\dh,m)] = o{n~^P) and \^^,^\dh,m) - 7^1 = Ov{n~^P). 

The remaining part of this section is devoted to the proof of Theorem In a nutshell 
this theorem establishes that if the error process in (1.1) is Gaussian, m-dependent and with 
autocovariance satisfying (2.13) then the extended bias of ^lf^\d), cf. (2.11), is minimized 
at d = 1. The same result holds when the correlation is non-negative (see lemma below). 

Proof of Theorem Since argmin^ggpo(d) = 1, cf. Theorem (|^, we only need to show 
that 

argminpi(d) = 1. (3.13) 

deK 

For the m-dependent processes fulhlling 1., cf. Theorem it suffices to note that p G 
(max{ —1, —8/(3m^) }, 1], and the validity of (3.13) follows by Lemma of Appendix A.2 


Lemma 1^ below shows the validity of (3.13) for the processes fulhlling 2. The proof is 
complete. □ 
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Lemma 2. Suppose that the assumptions of Theorem hold. Additionally, assume that 
the correlation function = 7 / 1/70 non-negative. Then Pi((i; 7 (.)) and BIAS*[ 79 ™^(d)] are 
minimized at d = 1 . 


Proof. It is convenient to keep in mind ( |3.4[ ) . For m = 1, apply Lemma of Appendix |A.2 
with p = ph > 0. For m > 1, since + l)/(d^ + d + 1)^ > 2/9 for all d G M, it suffices to 
show that, 

m 

Ph q^ff'\d) — > 0, for all d G M. (3-14) 


h=l 


Let pmin = min {pi,..., pm} > 0. Observe that 

m m 


> 0 , 


h=l 


h=l 


the latter follows again from Lemma with p = Pmin- This shows the validity of (3.14). □ 


4. On the yffi-CONSISTENCY OF DIFFERENCE-BASED AUTOCOVARIANCE ESTIMATES IN 
NONPARAMETRIC HOLDER REGRESSION WITH STATIONARY m-DEPENDENT ERRORS 


Thus far we have conducted a non-asymptotic analysis of bias and variance of the class of 
estimates (2.5)-(2.6) for Gaussian errors along with the condition (2.8). This led us to the 
bias-reducing estimates ^jf^\dh^m), cf. Eq. (2.12)-(2.14). Moreover, we showed that Jk = 
o{y/n) is a sufficient condition for the A/n-consistency oi^^f^\dh,m) in the segment regression 
model (l.l)-(2.1)-(2.8), see Theoremabove. In this section, we show that ^^j^\dh,m), are 
-y/n-consistent estimates of 7 / 1 , h = 0 ,..., m in a general regression model with possibly 
non-Gaussian, stationary, m-dependent errors. 

More precisely, we consider observations from the regression model introduced in (1.1) 
when the unknown signal / admits the representation 

/(^) = a;G[0,l), (4.1) 

j=0 

where Oj : [0,1) —)■ M are unknown Holder functions, i.e., for all x,y E [0,1) there exists a 
generic constant G > 0 and an index aj > 0 such that 


\aj(x) - a,( 9)1 <C\x- j = 0,..., A'„ - 1. 


(4.2) 


We will assume that / has effectively discontinuity points. Namely, let {tj := — 

aj+i(rj_|_i) be the j-th jump of /, we will assume that there exists a number c > 0 such that 

I'djI > c for j = 1,... , Kn — 1. Again, the change-points 0 = tq < ri < • • • < tk„-i < tk„ = 1 

and Kn are unknown and may depend on n now. We assume that the errors are a sample 
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from a zero mean m-dependent process with stationary moments up to order 4, {ei)i>i. The 
main result of this section is now stated. 


Theorem 7. Consider the model given by Eqs. (l.l)-(4.1)-(4.2). Suppose that Eg. (2.8) 


holds with K replaced by Kn- Assume that A* := maxj{|'dj|} < oo. Let be given 


by Eqs. (2.12)-(2.14). Then, for h = 0,... ,m 

Jm) 


K„ 


BIAS [7^(4,™)] = 0[— + Yl 


-2{aj+l/2) 


(4.3) 


i=i 


If additionally, Kn = o{y/n), 

\lt^\dh,m) -lh\ = Op(n“^/^). (4.4) 

Proof. The auxiliary results to this proof can be found in Appendix The validity of 


Eq. (4.3) follows from Lemma 13 Since a* := minj{aj} > 0, combining Lemmas 14 and 15 
we get that 

- 7(i)l ^ 0{Kn/Vn) 

VAR(v^i(7b'(4^„) - 7k)) = 0(Kl/n)) + 0(1), 


The validity of Eq. (4.4) now follows by using = o{^/n). 


□ 


5. Projection-based covariance matrix estimation 
In general, autocovariance estimates based on difference schemes may lead to a non¬ 


positive dehnite (ill-dehned) covariance matrix estimate in model (1.1). In order to overcome 
this problem in this section we propose a projection-based covariance estimation method. 
Assume that the vector Y = {yi ■ ■ ■ yn) G M” follows the model (1.1) with signal / G 4 


(some class of functions) and zero mean, stationary m-dependent errors e = (£i)i<i<n, and 
associated covariance matrix 


f 70 7i 
71 70 


pM — 


7m 


\ 


'fm 


'fm 


\ 


7m 


70 71 

71 70, 


(5.1) 


That is, T*'™'^ G Cn\ the set of all the nxn symmetric, positive semidehnite (m-|- l)-banded 
Toeplitz matrices. Observe that C, 


(m) 


is a subset of TC, the Hilbert space of all the n x n 
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symmetric matrices with inner product (A, B) = tr(y4^ B) and induced (Frobenius) norm 
il^ll-F = ^i,j^ A,B E'K. Let Sn and denote the subsets of d£ of all the positive 

semidehnite and (m + l)-banded Toeplitz matrices, respectively. Note that = SnATn^^ 
We dehne MSE(^_rM)[f] := ErM||r(”^) 


T\\l, where f e B, F^”^) e F 


(m) 


is given by (5.1), 

and F G FC, is some estimator. The following lemma is essential to get the main result of 
this section. 


Lemma 3. Let Q be a closed convex set of a Hilbert space “K with inner product (•, •) and 
induced norm ||'d|| = ('d,'d)^/^. Let Y ~ Pl ?0 where "do G 0 and let d ^ “K be an estimate of 
do- Let Pe{d) = argmin{||'d — 'd|| :'d G 0} be the unique projection of d onto 0. Then, 




(5.2) 


Proof. The well-known projection theorem, cf. |Luenberger| ( |l968| , p. 69, characterizes P 0 {d) 
by the condition that 

{d-Pe{d),d-Pe{d)) <0, WeQ. (5.3) 


Then observe that 


11^-411' = ||Pe(^)-^o|P+||(/-Pe)(^)|p-2(^9-P0(^),4-Pe(^)) > ||Pe(^9)-4ir (5.4) 


The latter follows by (5.3) since do G 0. 

(pl). 


The result follows by taking expectations in 

□ 


This general principle of convex optimization allows us to get well-dehned covariance 
estimates with reduced risk by properly projecting preliminary (and possibly ill-dehned) 
estimates onto More precisely: 


Theorem 8 . Let (70 (F^) 


7 m(F^)) be any estimate of the vector (70 


7 m) whose 


w.r.t. 


corresponding matrix is denoted by F and has the form (5.1). Let us define F* := Pc^iT) = 
argmin{||F — : F^”^^ G i.e., the unique projection o/F onto 

Then, 

MSE(^,P(,.))[r] < MSE(^,r(.))[f], for all (/,F(-)) G ^ x . 


Proof. Since is the intersection of {Sn and Tn^'^) two closed convex sets of df, the result 
follows by an application of Lemma □ 


Remark 2. The validity of this result does not depend on any distributional assumption 
about the errors £ neither on any specihc form for the family of signals P. Observe also 
that Theorem holds true for any 6 -banded Toeplitz matrix for 1 < 6 < n — 1 provided b is 
hxed. Now, we show how to compute F*, that is, the nearest symmetric, positive semidehnite 
banded Toeplitz matrix to a given covariance matrix estimate. 
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5.1. Alternating projections method. 

In this subsection we utilize the notation introduced in Theorem The representation of 
as the intersection of Sn and suggests an alternating projection algorithm for the 
computation of T*: in order to compute T* we have to project iteratively onto Sn and then 
onto by repeating the operation 

T ■<—(P 5 (T)). (5.5) 

Let Aj be the i-th eigenvalue of T. The spectral decomposition of T = QDQ^, where 
D = diag(Ai) and Q is an orthogonal matrix containing the eigenvectors of T gives us 

-P-s„(r) = Qdiag(max(Aj,0)) 


see e.g., Theorem 3.2 of Higham (2002). 

It is well-known that P_{m)(r), the orthogonal projection of T onto Tn'^\ is given by the 

' n 

n X n symmetric, banded Toeplitz matrix whose hrst row is given by 

n—k 


tk , ^ ^ 'li, 

n — k 




= 0,..., n — 1, 


i=l 


see e.g., Eqs. (2.3)-(2.5) of Grigoriadis et ah| (1994). 

Since Sn is not a linear subspace, the alternating projection algorithm (5.5) requires a 
modihcation for it to converge. Such a modihcation is due to Dykstra (1983) which combines 
a benehcial correction to each projection which can be seen as a normal vector (in a geometric 
sense) to the corresponding convex set. 


Algorithm 1. Given a symmetric matrix Sq G this algorithm computes the nearest 

Toeplitz covariance matrix to So in the Frobenius norm. 


DCo = 0,Po = So 

for fc = 1, 2 ,... 


Rk = Pk -1 — DCk- 1 , %DCk-i is Dykstra’s correction. 

Xk = PSniRk) 

DCk = Xk — Rk 

Pk = P-rirn){Xk) 

' n 


end 


The function nearPDToeplitz, in the R package dbacf, implements this algorithm. Ac¬ 
cording to Theorem 2 of Boyle and Dykstra (1986) the sequence P^, k = 0,1,..., converges 
to P„(™)(S'o), the orthogonal projection of the initial point S'o onto the closed convex set of 

symmetric, positive semidehnite, (m -|- l)-banded Toeplitz matrices. 
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6. Simulations 


This section contains 2 simulation studies. The first one assesses the performance of (2.12) 


(2.14) for autocorrelation estimation and their robustness against non-normally distributed 


errors. In the second study we assess the performance of these estimates when the signal is 
a smooth function. 

In our simulations for the noise we consider a 1-dependent error model: Si = -|- 

ri5i-i where 5iS are i.i. P-distributed, tq = [-y /1 -l- 271 -|- a/1 — 271 J /2 and ri = [a/1 -|- 271 — 
VT^^'2^]/2 for —1/2 < 7 i < 1/2; we will utilize V = A/'(0,1) and to assess robustness 
against non-normality we will use V = Since the autocorrelation of Si at lag 1 satisfies 
Pi = 7i, we will assess estimation of pi and p 2 = 0. 

In this section and in the upcoming Section we will use po to denote autocorrelation 


estimates based on the autocovariance bias-optimized estimates (2.12)-(2.14), the symbols 


Ph, Pm, Pr, Pp, Phv will denote Herrmann et al. (1992)’s, Miiller and Stadtmiiller (1988)’s, 


adapted-Rice (1984)’s (set d = 0 in (2.12)), Park et al. (2006) and Hall and Van Keilegom 


(2003)’s autocorrelation estimates, respectively. Hall and Van Keilegom (2003)’s estimators 


depend on two smoothing parameters, h and I 2 , following Section 3 of their paper we have 
chosen /i = and I 2 = y/n in our simulations. Recall that except for 'pnv, fhe other 
difference-based estimates require m as a parameter, we have used m = 2 in each case. 

6.1. Piecewise constant signal with 1-dependent errors. 

The simulated signal in this study is an adaptation of that of Chakar et al.| (2016). More 
precisely, the signal is a piecewise constant function, /, with 6 change-points located at 
fractions 1/6 ± 1/36, 3/6 ± 2/36, 5/6 ± 3/36 of the sample size n. In the first segment / = 0, 
in the second one / = 10 , in the remaining segments / alternates between 0 and 1 , starting 
with / = 0 in the third segment. 

Tables [T]j^ summarize the results for n = 1 600 obtained from 500 samples for Gaussian 
and G errors, respectively. Each table displays the MSE of po, Ph, Pm, Pr and pHv- Table [T] 
shows that 'po overperfoms all the other estimates under comparison for all the values of 71 ; 
note that for positive values of 71 , 'po beats pm, Pr and 'pnv up to 2 orders of magnitude. 
Since 'po and 'pn use the same statistic to estimate 70 , it is expected that 'po only overperfoms 
'Ph marginally. Similar results are obtained when the errors are heavy-tailed (t 4 -distributed), 
see Table That is, the performance of 'po does not seem to depend on the Gaussianity of 
the errors, as also suggested by Theorem 


6.2. Smooth signal with 1-dependent errors. 

The purpose of this study is to assess the performance of 'po for smooth signals when the 


bias of our estimators becomes less influential. To this end, and following Park et al. (2006), 
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Table 1. The MSE of various autocorrelation estimators of pi = 71 and /O 2 = 0 under the 
1-dependent error model = r^di + ri5i-i where 5iS are i.i.d. A/'(0,1), tq = [-^1 -I- 271 -|- 
y/l — 27 i ]/2 and ri = [y/1 + 271 — -y/1 — 271 J/ 2 , based on 500 pseudo-samples of size 200. 


Signal is specified in Section 6.1 




Po 

Ph 

Pm 

Pr 

Phv 



7i = 

- 0.5 








Pi 

= 7i 

0.0147 

0.0162 

0.0357 

0.0359 

0.8749 



P2 

= 0 

0.0035 

0.0037 

0.0049 

0.0049 

0.3520 



7i = 

- 0.4 








Pi 

= 7i 

0.0132 

0.0144 

0.0311 

0.0313 

0.7612 



P2 

= 0 

0.0033 

0.0036 

0.0048 

0.0049 

0.3522 



7i = 

- 0.2 








Pi 

= 7i 

0.0077 

0.0089 

0.0198 

0.0199 

0.5526 



P2 

= 0 

0.0023 

0.0024 

0.0036 

0.0037 

0.3499 



7i 

= 0 








Pi 

= 7i 

0.0049 

0.0057 

0.0126 

0.0126 

0.3801 



P2 

= 0 

0.0016 

0.0022 

0.0038 

0.0038 

0.3520 



7i 

= 0.2 








Pi 

= 7i 

0.0023 

0.0028 

0.0062 

0.0062 

0.2377 



P2 

= 0 

0.0013 

0.0018 

0.0034 

0.0034 

0.3505 



7i 

= 0.4 








Pi 

= 7i 

0.0008 

0.0010 

0.0022 

0.0022 

0.1296 



P2 

= 0 

0.0010 

0.0015 

0.0033 

0.0033 

0.3512 



7i 

= 0.5 








Pi 

= 7i 

0.0006 

0.0007 

0.0011 

0.0011 

0.0885 



P2 

= 0 

0.0011 

0.0018 

0.0036 

0.0036 

0.3538 



Notation: 70 , 7 //, 7 m, 7/j, 7 My denote optimized difference-based. 


Herrmann et al. 


(1992)’s, 


Muller and Stadtmuller| ( 1988 1 ’s, adapted-Rice (19841’s and Hall and Van Keilegom (2003)’s 


estimates, respectively. 


we consider the mean fnnction f{x) = 300a;^(l — l[o,i](a;) sampled at points Xi = i/n. 

As errors we nse the 1-dependent process (Sj) introdnced above with V = A/'(0,1), see 
Section 16.1[ 

Park’s estimation method consists of two stages. First, an optimized bimodal kernel 
method pre-hlters the signal, the resnlting residnals are then nsed to estimate the corre¬ 


lation strnctnre via an ordinary difference-based method. Since onr example follows Park 


et al. (2006)’ specihcations, we nse their reported resnlts and we apply the difference-based 


methods abovementioned for antocorrelation estimation of pi and p 2 = 0 . 

Table snmmarizes the resnlts obtained from 500 samples of size n = 200. According to 

this table, pp and 'pnv are almost identical and these two estimates overperfom all the others 

in almost all the cases. When the correlation is non-negative, 'pnv is only marginally better 

than 'po, Pm and pr. We conclnde that for smooth signals, antocovariance estimates based 

only on difference schemes {pm, Pr, Phv) are comparably as accnrate as other estimates 

which rely on kernel-based residnals, for highly positively correlated noise (71 = 0.5) even 

the bias-optimized estimate po ontperforms snch estimates. 
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Table 2. The MSE of various autocorrelation estimators of pi = 71 and p 2 = 0 under 
the 1-dependent error model Si = r^Si + riSi-i where i5i’s are i.i.d. t 4 , rg = [-^1 -I- 271 -|- 
•vT^-^^]/2 and ri = + 271 — -^1 — 27 i]/ 2 , based on 500 pseudo-samples of size n = 


1600. Signal is specified in Section 6.1 



Po 1 Ph 

Pm 

Pr 

PHV 

7i = -0.5 






Pi = 7i 

0.0069 

0.0063 

0.0119 

0.0120 

0.4620 

Pa = 0 

0.0038 

0.0037 

0.0032 

0.0033 

0.1863 

II 

1 

0 






Pi = 7i 

0.0059 

0.0056 

0.0104 

0.0104 

0.4042 

Pa = 0 

0.0035 

0.0032 

0.0031 

0.0031 

0.1868 

7i = -0.2 






Pi = 7i 

0.0042 

0.0042 

0.0074 

0.0075 

0.2968 

Pa = 0 

0.0023 

0.0021 

0.0023 

0.0023 

0.1879 

7i = 0 






Pi = 7i 

0.0035 

0.0035 

0.0047 

0.0047 

0.2018 

to 

II 

0 

0.0022 

0.0018 

0.0021 

0.0021 

0.1870 

7i = 0.2 






Pi = 7i 

0.0015 

0.0016 

0.0025 

0.0025 

0.1272 

to 

II 

0 

0.0014 

0.0013 

0.0016 

0.0016 

0.1873 

0 

II 






Pi = 7i 

0.0008 

0.0009 

0.0010 

0.0011 

0.0690 

Pa = 0 

0.0011 

0.0010 

0.0014 

0.0014 

0.1870 

71 = 0.5 






Pi = 7i 

0.0005 

0.0006 

0.0006 

0.0006 

0.0463 

to 

II 

0 

0.0010 

0.0011 

0.0014 

0.0014 

0.1849 


Notation: 70 , Jh, 7m, 7/j, ^hv denote optimized difference-based. 


Herrmann et al. 


(1992)’s, 


Muller and Stadtmuller[ ( 1988 1 ’s, adapted-Rice (19841’s and Hall and Van Keilegom (2003)’s 


estimates, respectively. 


7. Applications 


7.1. Dependency of ion-channel recordings: Gramicidin A. 

Ion channels are proteins regulating the flow of ions across the cell membrane by random 
opening and closing of pores. Typical experiments such as patch-clamp recording would 
move an electrode close to an ion channel allowing that electrical currents flowing through 
the channel can be measured. In this section we consider recordings of 60 secs of gramicidin 
A; this ion channel trace was recorded at a sampling rate of lOkHz and digitized with a 
low-pass 4-pole Bessel Alter at a cut-off frequency of IkHz, see Figure |1A| and |Hotz et ah 


(2013) for further details. 


Common ion channel traces resemble realizations of model (1.1) where the error model 


is now the convolution between the (discrete) kernel of the low-pass 4-pole Bessel Alter 
and realizations of i.i.d. normal random variables with zero mean and variance a^. That 
is, in this case we can compute the theoretical correlation function of our observations. 


Indeed, we utilize the function dfliter from the CRAN package stepR (Hotz and Sieling 
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Table 3. The MSE of various autocorrelation estimators of pi = 71 and /O 2 = 0 under the 
1-dependent error model = r^di + ri5i-i where 5iS are i.i.d. A/'(0,1), tq = [-^1 -I- 271 -|- 
y/l — 27 i ]/2 and n = [y/1 + 271 — -y/1 — 271 J/ 2 , based on 500 pseudo-samples of size 200. 


Signal is specified in Section 6.2 



Po 

Ph 

Pm 

Pi? 

PP 

Phv 

7i = 

-0.5 







Pi 

= 7i 

0.0318 

0.0210 

0.0161 

0.0168 

0.0029 

0.0071 

P2 

= 0 

0.0392 

0.0314 

0.0264 

0.0263 

0.0076 

0.0077 

7i = 

-0.4 







Pi 

= 7i 

0.0268 

0.0182 

0.0141 

0.0147 

0.0038 

0.0073 

P2 

= 0 

0.0314 

0.0264 

0.0205 

0.0207 

0.0065 

0.0073 

7i = 

-0.2 







Pi 

= 7i 

0.0220 

0.0171 

0.0133 

0.0133 

0.0062 

0.0084 

P2 

= 0 

0.0221 

0.0181 

0.0144 

0.0145 

0.0060 

0.0066 

71 

= 0 







Pi 

= 7i 

0.0156 

0.0128 

0.0097 

0.0098 

0.0064 

0.0069 

P2 

= 0 

0.0164 

0.0133 

0.0103 

0.0104 

0.0067 

0.0067 

7i 

= 0.2 







Pi 

= 7i 

0.0125 

0.0112 

0.0083 

0.0082 

0.0065 

0.0054 

P2 

= 0 

0.0122 

0.0094 

0.0074 

0.0074 

0.0081 

0.0080 

7i 

= 0.4 







Pi 

= 71 

0.0067 

0.0064 

0.0043 

0.0044 

0.0049 

0.0035 

P2 

= 0 

0.0089 

0.0070 

0.0048 

0.0051 

0.0109 

0.0091 

7i 

= 0.5 







Pi 

= 7i 

0.0051 

0.0052 

0.0034 

0.0035 

0.0589 

0.0029 

P 2 

= 0 

0.0097 

0.0079 

0.0054 

0.0058 

0.1059 

0.0096 


Notation: po, pn , Pm -, Pr , Pp , 'phv denote optimized difference-based, 


Herrmann et al. 


(|1992 )’s, Muller and Stadtmiiller (1988)’s, adapted-Rice (1984)’s, Park et al. (2006)’s and 


Hall and Van Keilegom (2003rs estimates, respectively. 


(2015)), and calculate the theoretical correlation between the observations of Figure lA 


Additionally, we calculate the correlation estimates (po) based on (|2.12[)-([2T4|)), [Herrmann 


et al. (1992)’s (pip), Muller and Stadtmiiller ( jl988 )’s (pm), adapted- |Rice| ( [l984l )’s (pij, set 


d = 0 in (2.12)) and Hall and Van Keilegom (2003)’s {'pnv) and compare them with the 
theoretical correlation. 

Figure displays all these difference-based estimates for m = 6, 8,10,12. Except for 'pHv 
the other estimates show minor quantitative differences between them and to some extent 
they are close to the true correlation. In particular, for m = 6, the proximity of po with the 
theoretical correlation function is remarkable. 

The main goal of this section is to explore an application of our autocovariance estima¬ 
tion techniques in a changepoint problem. To this end, we introduce a method for JUmp 
Segmentation of (short-range) Depedent data (JUSD). This method combines the multiscale 
changepoint estimator introduced by Frick et al.| (2014) with the autocovariance estimates 
proposed in this paper. 
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(A) (B) 



(C) 


(D) 


Figure 2. Correlation estimators from ion channel recordings shown in Figure 
(‘^); PO) optimized difference-based (♦); Herrmann et al. (1992)’s (*); 


lA 


and Stadtmiiller 

(19881 

(2003 

I’s (A). 


true 


Muller 


Phv^ Hall and Van Keilegom 


7.2. Jump segmentation for dependent data. 

Now we briefly describe JUSD. Let Y denote the vector of observations Vi = fi + Si 

following (1.1) and for simplicity suppose that the errors Si are Gaussian distributed. The 

multiscale approach to detect a change-point given in Eqs. (3)-(4) of Frick et al. (2014) 

suggests to calculate all local likelihood ratio test (LRT) statistics T/(Y, fy), I < i < j < n, 

and to use the maximum over these statistics as the hnal test statistic, say T„. Here 77 
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denotes the hypothesis that / takes a hxed value on the interval [i/n,j/n\. Then, estimates 
and conhdence regions for the change-points of / can be derived from the distribution of T„, 
which can e.g. be obtained via simulations, see Frick et ah (2014) for the independent case. 

In the current situation of m-dependent errors, for each i and j the local LRT depends 
also on the inverse of the {j — i + 1) x {j — i + 1) covariance Toeplitz matrix, say 
In the worst case scenario, the dimension of such matrices is and in order to avoid the 
expensive calculation of we propose a more tractable multiscale statistic. We re¬ 

weight the partial sums of the observations yi according to the dependency. More precisely, 
let S'/ = yi + ■ ■ ■ + yj, i < j, and consider the modihed local statistics based on partial sums 

TiiY- fn) = \Si - /^|VVAR(^/), l<i< 3 <n. 

Now our proposal (JUSD) consists of utilizing our difference-based estimates 70 to estimate 
VAR(S'/) = (j — i-|-1) 7 o -|-2 — i- 1 -1 — A;)+ 7 fc, here a:+= max(0, x). The corresponding 

modihed version of T„ is denoted by T^. Under the true error model, Monte Carlo simulations 
allow us then to determine the hnite-sample distribution of T„, and the subsequent estimation 
of the number of changepoints, locations and levels of / is carried out as in Eqs.(5)-(6) of 


Frick et ah (2014). 


For the simulation design of this section we use the signal introduced in Section |6.1 and a 
6 -dependent Gaussian process for the error model. This error model corresponds to the esti¬ 
mated dependence structure of the ion channels introduced in Section [7T| see Figure [ 2 X| To 
simplify matters in the Monte Carlo experiment to determine the hnite sample distribution 
of Tn, we will use an equivalent MA( 6 ) representation for the error model. The coefficients 
of this MA( 6 ) model are obtained via an application of the Durbin-Levinson algorithm 
(Proposition 5.2.1 of Brockwell and Davis (2006)) to the sequence h = 0,1,..., 6 , 


cf. ( [2T^ -( [2T4l ). 

This Monte Carlo experiment is performed only once and its output is used to assess 
the performance of JUSD in estimating the signal from 500 samples of size n = 1600. For 
comparison, we have also utilized the R function smuceR which was designed for independent 
errors by Hotz and Sieling (2015) to estimate changepoint locations and levels in the current 
simulation. In each simulation the signihcance level of the estimated number of change 


points was set to a = 0.05 in both methods. Figure |3A| displays the £t of JUSD and 
SMUCE on one realization of the yiS and Figure [3B] shows the same hts on a small part of 
this simulated dataset. Unlike JUSD, SMUCE does not take into account the correlation 
between observations and this leads to a clear overestimation of the number of changepoints, 
see Figure |3Cl 
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222 711 888 1200 1466 


(A) 



(B) 


(C) 


Figure 3. (A) (Gray background) An example of observations y^’s as described in 
(^—) True piecewise constant signal as described in Section 6.1 


Section 7.2 


JUmp Segmentation for Dependent data as introduced in Section 7.2 (■■■■) Simultaneous 
MUltiscale Changepoint Estimator as presented by Frick et al. (2014). (B) Fit of JUSD 
and SMUCE on data shown in (A) restricted to the interval [711,888). (C) Boxplot of 
estimated number of changepoints by JUSD and SMUCE; true number of change-points is 

6 . 
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Appendix A. Proofs and auxiliary results eor Section [3] 

Thronghont this snpplementary materials we use the following notation: Um '■= n — 2{m + 
1 ), Hh ■= n — h] fi denotes f{xi), v^i+h) denotes the vector {viVi+i ■ ■ ■ Vi+h)~^ G (we 

use this notation with v G {y, f, e:}), (a, b) denotes the inner product between the vectors a 
and b, 1 denotes the vector of ones, and for x E ||a:|| denotes its Euclidean norm. 


A.l. Proofs for Section 13.11 

We begin with some preliminary results. For I < n, let Qi(Y,Wi) be a difference-based 
estimator of order I and gap 1, cf. Eq. (2.2). With D as dehned in (3.1) dehne the (/ -|- 2) x 
(/ -|- 2) matrix D := D. Observe that the identity 
i+i 

{doVi + diVi+i + --- + di Vi+if = D i/yQ+z+i), j <n-2l-3, 

i=j 

implies that 


ni 


2niP{wi)Qi{Y,wi) = {wi,yi,(^i+i)f + YyJ.(j+i+i)^yj--U+i+^) + (A-l) 

1=1 

where P{wi) = (^- 1 ) it is not difficult to see that E[{wi,yk:{k+i)y] = 

o{n) for fc = l,n - / and that for j < n - I, D yj.^+i+i)] = 

^Y-i^j+h+i) ^ ^j-{ 3 +h+i)]- Next, we combine this with Proposition and get that, 

E[P{wi)Cl,{Y,Wi)] = wJ Sz+i ru; -I- Yj ll-^/nd+i+i) IP+ 0(1), (A.2) 

^ 1=1 

where Sj+i is the (Z + 1 ) x (/ + 1 ) autocovariance matrix Sj+i = ( 7 |j_j|)^ ^. 
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Proof of Theorem It suffices to consider the difference-based estimator of order I < n 
and gap 1, Qi(h", Wi). For I < m, due to Lemma ae ^ becomes 


E [Piw,)Q,{Y, n;,)] = Bi + o[n-^ Jk 

k=l \j=k 


(A.3) 


where 


Bi — wj S;_|_i Wi — 7o(do + ■ ■ ■ + df) 2 'yi[d(}di di-idi) 2 'yid(}di. (A.4) 

From now on in this proof we assume that Jk = o{n) and disregard the second summand 


in the right-hand side of (A.3). Since constraint (lU/, 1) = 0 implies that at least one of 
the weights dj is nonzero, for simplicity from now on we assume that do 0. According to 
Lemma 1^ in this case there does not exist any constant c such that Bi = cyo, that is, no 
difference-based estimate of the form Qi(F", wi) can be an asymptotically unbiased estimate 
for 7o for I < m. 

Next, suppose that m < I < n. In what follows, for simplicity, let ns assume that 
n = N{g + 1) for some integer A^ > 1 and g = m. Observe that in this case Proposition]^ 


and (A.3) yield, E\p{wi)Q,i{Y,Wi)] = Bi + o(l). Due to m-dependency, the covariance 
matrix appearing in Bi is an (m -|- l)-banded Toeplitz matrix, i.e., the (*, j) entry of 
is given by 7p-j| 7^ 0 if |j — i| < m, and outside the (m -|- l)-diagonal the entries of 
are equal to 0. Suppose that wi has entries do = 1 and dj 7^ 0, for j = k{g + 1) with 
k = 1 ,..., A^ — 1, and dj = 0 otherwise. Clearly, Bi = P{wi) 70. Now we show that any 
asymptotically unbiased estimate for 70 is necessarily of the form just described. Indeed, 
it suffices to consider the vector wf whose entries are identical to those of Wi except for 
df; 7^ 0 for some k G {!,... ,m}. In this case, due to the form of the covariance matrix 
E;_|_i, = ci7o-|-2 df, 7f,, for some constant Ci. Since 7^ 7^ 0, no difference-based 

estimate of the form Qi(D, can be an asymptotically unbiased estimate for 79. 

Note that the arguments above hold also for g > m. Thus, we have shown that for 
Qi{Y,wi) to be an (asymptotically unbiased) estimate for 70, the vector of weights wi 


must have the form Wi = (uq Vi 


vk-i)\ where Vi = {di.g 0 


0)^ e W+\ 


i = 0,..., A^ — 1; do 7^ 0, dk-g 7^ 0 for some 1 < fc < A^ — 1, and ^i-9 ~ This 

completes the proof. □ 

Lemma 4. Suppose that the conditions of Theorem^hold. Let Qi(F, lU;) be the difference- 


based estimator of order I < n and gap 1 , cf. (2.2). Then for I < m, there does not exist any 
constant c 7^ 0 such that Bi = wJ wi = cjo, where E;+i is defined by (A.2). 


Proof. We use induction over 1 . For I = 1 , Bi = 7o(do -|- df) -|- 27 idodi, cf. ( |A.4[ ). Since by 
assumption 71 7^ 0 (and do 7^ 0), di is necessarily equal to zero but this implies that 70 = 0 
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to fulfill (2.3). This contradiction shows that the claim holds for h = 1. Then we assume 


that the claim holds for I = k < m — 1. Let k = k + 1 and now note that for Bj, = C 70 to 
hold for some c 7 ^ 0, necessarily %dodj^ = 0. Since 7 ^ 7 ^ 0 and do 7 ^ 0, necessarily d^ = 0. 
The latter shows that B^, = B^. Hence, the claim holds for I = k + 1 and this completes the 
proof. □ 


Proof of Lemma For j < n—l dehne S 77 = {wi, and note that \\DfJ. 


i:(i+/+l)l 


see (3.1). In what follows we only consider sj^i since Sj+i,; can be handled similarly. 
Under the convention tk = /(x*) = if and only if tk_i <i<tk- Then 


ni 


K-l 

i=i 


i=0 


ti+i—l—l U+i —1 


(A.5) 


Then note that for any i G {0,..., iF — 1}, 

ti+i—i—i 

Sj,i = af{wi,l)^{ti+i-ti-l-1) 

and utilizing that do + ■ ■ ■ + di = 0, 

ti+i—l u+i —1 i 

^ ^ ^j,i ~ ^ ^ i^ofj + • • ■ + difj^i) = (a* — Oj+i) ^ ^ I ^ ^ dj 


j=U+i—l 


j—i-i+i — l 


k=l \j=k 


Substituting these expressions into (A.5) we obtain the right-hand side of (3.2). This com¬ 
pletes the proof. □ 


Proposition 1. For I < n, let Wi G be the vector of weights in Qi(U, lU;), cf. (2.2) 
Let D = D where D is defined by (3.1). Then 

E[^y(j+i+i) D = 2 wj Si+i wi. 


Proof. Let S;+i be the matrix dehned in ( |A.2 ) and note that 

D ej,Q+i+i)] = trjZlSi+s} = tr{5 5^}. 
Then, observe that the 2x2 matrix D S ;+2 D"' can be written as: 




J'Fi+i (?nz,7p+i):i) \ (wi 0 

0 wi 


j^/,7(Z+l):l) 'wJ ^l+l 

where 7(z+i):i := (7^+1 7z ■■■ 72 71)''' ^ A straightforward calculation yields, 

tT{DT,i+2D^} =‘iwj Ti+iWi. □ 
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A. 2. Proofs for Section 13.21 

We will assume that the conditions of Theorem hold. Also we will use the following 
notation: Xhifi) ■= fi - fi+h, for d e R, 8i{d) := So{i) + ds 2 {m+i){i) where for fc > 0, 
• fi+k fi+m+ly ^i+m+1 T *^(^i+2(m+l) ^i+m+l)) ^^t P{(£) ‘2{d T d T 1) 

and for given r*, Tj G [0,1], 1 < L, i? < n, 

:= [t - L/n, Tj - R/n). (A.6) 

Also, := Under the convention tj = [nr^J, we will denote := [U — L,tj — R) 

and use that i/n E if and only if i G without further mention. 

/ ‘ ii ^ j 

The following identities are of great use in what follows: for any integers r, s, u and n, 


E[el Eu £v] 

E[ 6 :^ Eg Ell 


7o + 2 7|r-s| 

7o7|'U—1>| T ‘^'~)\r—u\X\r—v\ 

^\r—s\1\u—v\ T y|)—T y|r—tilT'Is—n|) 


cf. Theorem 3.1 of Triantafyllopoulos (2003). 


(A.7) 

(A.8) 

(A.9) 


Lemma 5. Let i > 1, I > 1, Q < h < {m + 1) and define Eij^h = do^j + diEi^h + d 2 Ei^ 2 h + 
- V diEi+ih- Then, 

i-i i 

= 7o(do + d^ + d2 H-h df) + 2 ^ ^ dj dk fij-k] h 

j=0 k=j+l 


Proof. Write = Aij^h + where 

i i-i i 

^ ^ djEj^_^jy,, Bilk ^ ^ Xj(^i,l,h^ djEi^jk ^ ^ dkEi^kh- (A. 10 ) 

j=0 j=0 k=j+l 

The result follows by noticing that due to stationarity, for any integers i, j, k and h, = 

7o and E[£'j_|_j/j£:j_|_fc/j] 'y\j—k\h- fo 

Corollary 1. For / = 2, do = 1, di = —(d + 1) and d 2 = d, Elrfi^d)] = 2(d^ + d + 1) 70 . 

Lemma 6. Fori>l,l> I, E[Eli^^^fi = 37 ^ (d^ + d? + ■ ■ ■ + dff. 


Proof. From ( |A.10D , + AAij^m+iBij^^+i + Note that (jA^ 

implies that E[Aij^rn+i Bij^rn+i] = 0. That is, 

^[Bll,m+l] = E[A7,m+l] +^^[Bll,m+l]- 
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(A.ll) 









Eq. ( |A.7D yields that for any integers i, j and k, E[ei+j(^jn+i)] = and E[ei+j(ra+i) £i+fc(m+i)] = 
7 q. Conseqnently, 


= 7oq 3 5 ; + 2 ^ 

i =0 k,j 


(A.12) 


Next, Eq. (A.9) implies that for r ^ s, E[xr{i,l,m + l)xs{i,l,m + 1)] = 0, see (A.10) for 
dehnition of X(^.){i,l,m + 1). Hence, 


+ 1 )] = 70 4 dr 

j=0 k,j 


(A.13) 


The last eqnality follows from (A.7). The resnlt follows by plngging A .12 and A.13 into 
lAdTl □ 


Lemma 7. Suppose that the conditions of Theorem hold. Then, 

A. Tl-in 1 TLrn. 


di{d)6j{d)E[r]i{d)r]j{d)] 

i=l 


-d{d + l)^(m + 1) + ^ (lh^\d)ph 


h=l 


7o Jk, 
(A.14) 


where q'f^\d) = [ 2 (m + 1 ) — 3h](d^ + 1 ) + d^h and ph = Jh/'Jo- 


Proof. For d G M, let Sr{d) = di{d)5i+r{d), 1 < r < Um- First observe that dne to 

stationarity, for any i > I and h > 1 , T(h, d) := E[qi{d) pi+h{d)] = 2 (d^ + d + 1 ) 7 ^ — (d + 
l)^ 7 |/i_(m+i)| + d 7 |/j_ 2 (m+i)|. Then dne to m-dependency, ^{h,d) = 0 for all h > 3m + 2. 
Conseqnently, we can write 


3m+2 


'‘'('■f'S'- 


(A.15) 


r=l 


Next, we present the details on how to compnte S'o(d). Utilizing (2.8) and (A. 6 ), we can 
show that for given tj, 

= i^ + ^)d^iaj+i - ai)^ = i^ + l)(«i+i - «i)^- 

, 2 (m+l),m+l jm + 1,0 

-.K-l 


Observe that S'o(d) = y'.^n ,jm+i,o 5f(d) = (m + l)(d^ + 1) .Jr- The key part 

J ^ XiEziTj 

in obtaining the snmmation Sr{d), r > 1 , consists of splitting it as shown above (and nsing 


(2.8)-(A.6)). Thns, we can show that Sr{d) = Tr{d) .Jr where 

(m + 1 — r)d^ + rd + (m + 1 — r) for r = 0 ,..., m 

Tr{d) = ■^ d ( 2 (m + 1 ) — r) for r = m + 1 ,..., 2m + 1 • (A.16) 

0 for r > 2{m + 1 ) 
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In order to get (A.14), substitute (A.16) into (A.15) and arrange terms. This completes the 


proof. 


□ 


Lemma 8 . Suppose that the conditions of Theorem\^hold. Then 

n-m-l rim 3m+2 

B := 2 ^ ElViid) r]‘^{d)] = 8 n^ {d‘^ + d+ ifil + 2 ^ - r) A^(d) > 0. (A.17) 


2=1 j=i+l 


r=l 


where 


K{d] 7(.)) = 8 + d + 1)^7^ + 2(1 + d)^7p;,_(^+i)| + ^2,^^l\h-2{m+i)\ 

— 4(1 + d) [(1 + d)^ + (d^ + d^ + d + l)]7/i 'y\h-{m+i)\ 

- 4d(d + l)^7|,j — (m+l)| 7|fe—2(m+l)|- 


(A.18) 


Proof. Straightforward calculations and Eqs. (A.7), (A. 8 ) and (A.9) yield that for i > 1 
and r > 0 , 

^[Vi{d) Vi+rid)] = 4 (d^ + d + 1)^72 + A^(d). 

Arrange terms and use m-dependency to get 

n-m-l rim-r 

Y Y1 E[hs(d)7sV(^)]- 


B 

~2 


r=l s=l 

The result now follows by noticing that Ar{d) = 0 for all r > 3(m + 1). 


□ 


Lemma 9. Suppose that the assumptions of Theorem\^hold. Additionally, assume that the 
correlation function ph = 'jh/'jo satisfies that ph = p & (max{ —1, —8/(3m^) }, 1), 1 < h < 
m. Then pi(d; 7 (.)) and BIAS* [ 70 ™^ (d)] are minimized at d = 1. 

Proof. Since BIAS[ 7 Q™'^(d)] is minimized at d = 1, cf. Theorem]^ we only need to focus on 
minimizing pi(-; 7 (.)). Let Q{d) = d^ + d + 1. It is easily seen that 


Pi{d;%)) = 


m + 1 

wy 


2 (d^ + 1 ) + rn{d‘^ + d^ + 1 ) ^ ph 


h=l 


For p = 0 (indepent observations) the result follows since argminj^g]g(d^ + 1)/Q‘^{d) = 1. For 
p > 0, we have that argmin^gR(d"‘ + df + 1)/Q‘^{d) = 1 and hence for d G M, pi(d; 7 (.)) > 
Pi(l; 7 (.)). For p < 0, note that 

d ,, , 2(m + l)(d^ — l)(d^(pm^ + 2) + d(pm^ + 4) + pm^ + 2) 

=-07)5-' 

It is immediate that on M, the critical points of pi are -1 and 1. For p G (—8/(3m^), 0), 

^Pi(—1;7(.)) = —Apm^ > 0 and J^pi(d; 7 (.)) = 4(9pm^ + 24)/81 > 0, i.e., both critical 

points are minima. The result follows by noting that pi(l; p) = Pi(—1; p)/9. □ 
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The following auxiliary results are used in the proof of Theorem^ Recall that is the 


ordinary difference-based estimator of gap h, cf. (2.4). Define 

n—h K—1 




i=l 


*=o 


We can show that 

(2(n - h)y E[(?("))2] = hVi + AhMn - h){^o - Ih) + E[C,] + E[D,]. 


For E[C/i] observe that due to Eqs. (A.7)-(A.8)-(A.9), 

E[(e:i - Eij^hf [Sj - Ej+hf] = 4(70 - -fhf + •di(7 j) -1- ^2{h3) 

with 

^i(b j) = const. 75 -i+.|, ^ 2 {id) = ^ const. 

S,t 

where s,t E {0, ±h}. That is, 

E[C,,] = [2{n - h)] 2 ( 7 o - 

where +i^2ihM note that = C’(n). 


(A. 20 ) 


(A. 21 ) 


Lemma 10. Suppose that the conditions of Theorem^ hold. Let D/^ be defined by (A.19). 
Then, E[Dft] = Fh{'j{.)) Jr where Fi = 2(70 — 71 ) and for 2 < h < m 


Fhili-)) = 2 


h j+l 


(h l)(7o 7fe) + y (27|j-^| 7|j-i-/i| 7|i-i+/i|) 

j=2 i=l 


(A. 22 ) 


Proof. For h = 1 the result follows by noting that for any ti, = { fj — 1}. For 2 < h < m 
note that 

K-l K-2K-1 

d-EE ((Z 2 (Z 2 -I- 1 ) ^k-\-h) “ 1 “ EE(“ s Ct^_|_i) (D,. 3 -)- Dg^^), 


i=0 


s=l r=l 


where 


Ds,t — ^ ^ ^ ^ ^i+h){£j ^j+h)- 




Since for any fi, E 


t-r ‘•s 

2 


“ ^j+h) = 2(h - l)( 7 o - 7 /^) + A*(h; 7 (.)), where 


h j+l 


^ (^)7(-)) 2 (27|j_j| 7p_j_/j| 7|j_j+/i|) , 

j=2 i=l 
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the result is established if we show that E[Dr,s] = E[Ds^r] = 0. To this end, observe that for 
any s E {1,..., — 2} and t E {s + 1,..., — 1}: 

ig 1 i‘p 1 

E[Ds,r] = [27|i_j| — j\j-i-h\ — 7|j-i+/i|] 

i=ts — h i=tr — h 


let X = tr — ts and recall that by assumption, mini<i<j^_i \ti — tj_i| > 4(m + 1) to get. 


h h 

= EE [27|a;+j-i| ^\x+j—i—h\ 7|3::+i-*+/i|] 0- 

i=l j=l 

The last equality follows because 7 |/i| = 0 for all h > m + 1. A similar argument shows that 
E[Dr,s] = 0. This completes the proof. □ 


From now on, ■= E]=v 

Lemma 11. Supppose that the conditions of Theorem^ hold. Let and be given 


by Eqs. (2.4)-(2.5). Then, 


ERhV) X ?"■'! = 


2P{d)nhnm 


[r + ir + IIP] Jk + s^^l{d) 


(A.23) 


where 


r = (m + l)(d^ + 1) [2n/j(7o - 7^) ELTk] 

m h m+1 h 

n* = 8{d^ - 1)14, ~ ^<h<m, 

s=0 t=l s=l t=l 

IIP = 2P{d) humlo- 

Here, S 2 ^l{d) = 0(n) and does not depend on Jk- 


Proof. By dehnition 


E|7"‘>(d) X ?'>)] 


E|/ + 11 + III] 

2-P(c?) 


where 

^ = Y1 “ ^j+hf] ’ 

III = Y1 - ^j+hf 

ij 

Since for all j, E[£j] = 0 and ^[{Sj 
to Eqs. (3.3)-(3.7) and get 


II 2 ^ ^ 5i{cL)pi{di) |^2yj(£'j + [Sj j 

hj 


Sj+hY] = 2(70 — 7 /i), we utilize the arguments leading 


P = E[J] = (m + + 1) [2 u/i(7o - 7h) + h Jk] Jk- 
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(A.24) 









Due to Gaussianity for any i and j, E[rji{d){ej — ejj^hY] = 0. Thus, according to Lemma 12 
r* = E[//] = 4 E[6,{d) ri,{d) Xjisj - 8,+^)] = 8{d^ - 1) Jk ^4- (A.25) 




Gaussianity and Lemma yield, '^ij ^[vf{d)Xj{£j - £j+h)] = 0, and E[r]f{d)] = 2P{d)xo, 
respectively. Gonsequently, 

r** = E[J/J] = 2P{d)jonmhJK + Sd^lid), (A.26) 

where by stationarity, = Yhij ~^j+h)‘^] = 0{n). In order to get Eq. (A.23) 

sum up Eqs. (A.24)-(A.25)-(A.26) and arrange terms. 

Lemma 12. Suppose that the conditions of Theorem\^hold. Let 

^K,d = hi{d) r]i{d)xj{8j - Sj+h)- 


□ 




Then, 


E[^KA = ^{d^-^)JKVh. 


(A.27) 


See Lemma [71] for a definition ofVh- 
Proof. Set Cm = 2(m + 1). Since for given j, 


diid) r]i{d) = d {aj - a^+i) x pi := E^^,(ci), 

^^jCm,,C77T,/2 

* i L 

6i{d)pi{d) = {aj - Oj+i) X Y Vi{d) ■= F^^.(d), 


.grc™/2,0 


CTn/2,0 


it follows that 6i{d)pi{d) = (E^^.((i) + F^^(ci)). 

Let Yhj YhiGp'^- Note that Xi = (®i ~ aj+i)]l./i,o(i) and this, in turn, implies 

that 'E^iXiAi - G+h) = - ai+i)(G - G+Zx) := Gh- Gonsequently, 

K-l 

(E.,(d) + F.^.(d)) X G;, = Ti + T2 + Ta + Gi + G2 + U,, 


j=0 


where 


Ti = 

T3 = 

U2 = 


d y~l ~ ®j+i)g Gh, T 2 — —d(l + d) y~^ {ttj — aj+i)ei^cm/2 Gh 


. j{cm,rnf2) 


j 

. j-{cm,m/2) 

’ L 

E 

(flj ajf_|_i)£'j_|_c^ Gh, U\ 

E 

(dj dj^i^Si G/i, 

• T{cm,ml2) 

Am/2,0) 

-(1 + d) 

'y ^ (®J ^j + lAi + CmP Gh, 

U3 = 

d y ^ iflj ®i + l)^* + Cm Gh 


Am/2,0) 

ti 


Am/2,0) 
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The result follows by computing the expected value of T’s and t/’s. Now we compute E[Ti] 
and E[17i], the remaining terms of E[Tx,d] can be treated similarly. In what follows, 


We begin by writing E[Ti] = ciE[Ti,i+Ti, 2 ], where Ti,i = Y.r,s 


Cm )Cm /2,/l,0 


aid U,, = Ety Et.yi(a. - a.+i) (ij - Oj+i) (E„ + E,, 


.i^-2 

^2=0 Z_-/ji=2-l- 

now that for any r,- and due to m-dependency, 


Note 




Cm 5^771 / 2 ,, 0 


Tj —(m+2) —1 2(m+l) h m-\-h h 

[7|r-s| — 7|r--(s+h)|] = ~ 

r=Tj — Cm s=Tn—h s=m +2 t=l s=m+l t=l 

(A.28) 


These calculations hold independently of the value of Tj, and consequently we get that 
E[Ti^i] = YlT=m+i'Yl^s=i'^\i'-s\JK- Similar calculations along with the m-dependency and 
( |2.8[ ) allows us to get that E[Ti^ 2 ] = 0. All in all, we have shown that 

m-\-h h 

E[Ti] = d E E 7|/i-(s+p| Jr- (A.29) 

5=m+l t=l 

Similar arguments yield, 

m+l h 

E[T2] = -d(l + d)'^Yl “ Tt-b+/*)|] Jk (A.30) 

S=1 t=l 

m h 

EIUI = d" ^ ^ [7«i - T|t-(«i)|l Jk- (A.31) 

t=l 

Now, we consider E[f/i]. Write Ui = Ui^i + f/ 1 ^ 2 , where 

K-l 

TT _ \ / \2 \ ^ 0^777/2, 0 ,/l .,0 

Ui^i - 2^(aj - a^+i) 2_^ Sj j 

j =0 r,s 

K -2 K-l 

Ui^2 = ~ fli+i) (% ~ ®i+i) 

2=0 j=i-\-l 



Following (A.28) we can show that for any Tj, E 


E 


s 

T,s 3^3 


C77i/2,0,/1,0 


E m+1 \-^h f 

s=i Et=i[7|t-.| 


7 |/i_(t+<j)|]. Again m-dependency and (2.8) allow us to show that E[f/i 2 ] = 0. Therefore, 

m+l h 

^[^ 1 ] ~ ^ 'y "I [ 77-^1 ~ 7|s-(r'+/i)|] Jr- (A.32) 


r=l s=l 
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Similar arguments yield, 


m+1 h 


E[f^2] (1 + d) ^ ^ ^ ^ [dr+s 7|/i-(r-+s)|] Jj 

r=0 s=l 
ra+h h 

^[^3] ~d ^ ^ ^ ^ 7|/i-(r+s)| JK' 


(A.33) 

(A.34) 


r=m+l s=l 
^m+1 \-^h 


o* \~^h v^^+1 \~^h v^^+1 \~^h \~^h 

Since 2^^^^ 7|/i-(r-+s)| — Z^r=l ^s=l 7|s-r|5 Z^r=l 7|s-(?’+/i)| — z^r=0 ^s=l 7^+55 

(A.27) follows after summing up ( |A.29 )-(A.34). □ 


Appendix B. Proofs and auxiliary results for Section 0] 

In this Appendix we will assume that the conditions of Theorem hold and use the 
notation introduced at the beginning of Appendices |A| and |A. 2 [ Also throughout this section 
Cm = 2 (m + 1 ) and the symbols Ki, K 2 , etc., will denote constants which do not depend on 
n. We will write - to denote Yl'i=j+v 

Lemma 13. Suppose that the conditions of Theorem^ hold. Then, for h = 0,... ,m, 


K„ 


ERhVi..™)] = 7k + 0(S„), Sk = I] 2 ^ 


1?? 


K„ 


n 


-2{aj+l/2) 


i=i 


i=i 


Proof. We begin with the case h = 0. Since 7 o™^(d) = (2((i^ + d + 1) rtm) ^ J27=iidi{d) + 


cf. Corollary [l 


Pi{d)Y, see Appendix A.2 for dehnition of dfd) and pfd), and E[r]f{d)] = 2{d? + d + 1 ) 70 , 


in order to analyze the asymptotic bias of ^^\d) it suffices to focus on 
2^- To this end we write df^d) = sl{i) + 2dsQ{i) Sc„,{i) + d^ sl^ii) and note that for 

given i there exists a unique Tj such that 

r(0,C77T./2) 

I L \^t ^ ^ J • — {Jj j \^l/! f ^ J ^ i I IIL I "E y / ^ ^) J-Lli L ^ J 

sopj = 


U-i>U 

(Cm/2,0) ’ 


t{i,j) -.= aj{i/n) — aj{{i + m + l)/n) for i 

u{i,j) := aj{i/n) — aj+i((i + m + l)/n) for i G 

and a similar characterization holds for Sc„j(-), see Eq. (A. 6 ) in Appendix A .2 for dehnition 
of and This implies that the dominat terms in ^^{d) are of the form 

in what follows we provide bounds for these terms. 

Recall that 'dj = — 0^+1 (tj+i) and by assumption there exists a number c > 0 such 

that I'djI > c for all j. Utilizing the Holder condition of / we get 

A„-l A„-l 

< f^l,m (B.l) 


t=l 


i&i: 


(0,Cm/2) 


i=i 
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as well as 


Kn — l Kn Kn Kn 

j) 1^2,m y] y 

j = l ■^j(cm/2,0) j = l j = l j = l 


Here = sup^ (m + 1)“^ < oo, 1 ^ 2 ,m = 2 Ki^m- Define the set of indices Ik^ ■■= {3 e 
{!,..., Kn} : I'djI > 1} and observe that 

-| 

< { E >’?+ E £ { E >’?+ E 7 ’^?} £ (!+'=■■) E 4 (B.2) 


i=i 


KKn j0Kn 


KIKu KKn 


i=i 


From (B.l) and (B.2) follows that Y}a=i ^K'^) = ^ ( J2j=i ^‘j + J2j=i )• Consequently, 




E|7i” V)1 = 70 + O (S„) , S„ = E ^ + E 


^-2(a,+l/2)_ 

j=l ■“ j=l 

For h > 1, hrstly note that by writing = (2(n — h))~^ + hi(0))^ can 

mimick the calculations above and get 

=7o-7/x + (^(-5n). (B.4) 

Then, by dehnition, = l^\d) — cf. (2.6) in the Introduction, and the result 

follows by adding Eqs. (IB.31) and (IB.41). □ 


Lemma 14. Suppose that the assumptions of Lemma \T^ hold. Then 

K„ 

VAR(7S™^(d)) = + 0(n-^) 


j=i 


the same result holds for \/AR{S^^^). Moreover, 


Kr,, 


K„ 


VAR(77>(<i)) = 0(VAR(7’">(rf)) + (E I’^ol/n)" + (E 

i=i i=i 


(B.5) 


(B,6) 


Proof. We write Jo^\d) = n ^'^b‘f{d), where bfd) = Sfd) + pfd), see Appendix 
for notation. It is easily seen that VAR(6^((i)) = 4:6f{d)\/AR{pi{d)) + \/AR{pf{d)). From 
Corollaryand Lemma]^ VAR( 7 i(d)) and \/AR{pf{d)) are uniformly bounded. This implies 
that the order of magnitude of J2i VAR(6^(d)) depends solely on dfid). From arguments 


A.2 


in the proof of Lemma 13 we get that 

Ttm f ^71 

E VAR(6?(<i)) = O ( E 7 + E 

i=i 


(B.7) 


i=l 


0 = 1 
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It can be shown that COV( 6 ^((i), 6 ^(h)) = 4:6i{d)Sj{d)E[rii{d)rij{d)] + 2Si{d)E[rii{d) rij{d)] + 
26j{d)E[rij{d) rif{d)] + CO\/{rj'f (d), rj'j (d)). Dne to m-dependency and stationarity of moments 
np to 4-th order we get for any i and j, E[r]i{d) rij{d)] = K 2 h 2 (|j ~ *1); (d)] = 

%h 3 (|j ~*l) coy{rjf{d),rf-{d)) = K 4 /i 4 (|j —i|), where /i 2 (-)) h 3 (‘) /i 4 (-) are fnnctions 

which depend only on sums of moments of second, third and fourth order of e, respectively. 
With the same arguments used in Lemma we can establish that Yhij did) 6 j{d)fX 2 {j — 
i) = Oin). Similar arguments allow us to get that 'Ylij did) fisij — i) = ^dd)) and 

E,, C 0 V(pf(d), 7 ^|(d)) = O(n). 

All in all, we have proven that 


C0V(6j(d), (d)) = O W + 0(n). 


(B.8) 




Since S^d) = So(^) + ds^^ii), see Appendix A .2 for notation, from the characterization of 
So(') and Scm(') given in Lemma 13 it follows that the order of magnitude of (B. 8 ) is driven 
by Ei Soil). We get, 

/ \ 


i^n-l 




2 = 1 


2=1 


dj,'i)+ 




jei- 


{cm/2,0) 


/ 


K„ 


O I 

0=1 


(B.9) 

The latter follows because of Holder condition on / and calculations leading to (B.2). Observe 
that Eq. (B.5) follows by a combination of Eqs. ( B.7[ )-(B.8)-(B.9). 

For h > 1, we write = (2(n — h))~^ Sr=i('®o(*) + Vii^)Y rnimick the calculations 
above to deduce that do"^\d) and have variances of the same order. Then, since 7 ^”^^ = 
7 q”*^ — cf. (2.6) in the Introduction, we combine Eq. (B.5) and Lemma 15 below to show 
the validity of Eq. (B. 6 ). This completes the proof. □ 


Lemma 15. Suppose that the assumptions of Lemma f_3 hold. Then 

COV(7'">(*,™).'5''‘') = O ((fi |f,l/")'+ (E "■'“■■"‘y ) + 0(n-‘). (B.IO) 

V j=i j=i J 

Proof. We begin with the case dh^m = 1- Throughout this proof k G {h, Cm}- By dehnition, 

1 




cov(7l,’">(i),!<'•>) = 


12 UmUh 




(B.ll) 


i=l j = l 


where for given index i, zi^k = yZ(^i+k+i)^k+ 2 yr.{i+k+i)-, see Appendix |A for notation of 
and Eq. (3.1) for dehnition of the (fc -|- 2) x (A: -|- 2) matrix Dk+ 2 - We also 
write Ciik) = fJ.j^^j^k+i) Dk+ 2 £v.{i+k+i) and dfk) = Dk+ 2 £v.{i+k+i) which by standard 

calculations yield CONizi^c^.Zj^h) = 4E[Q(cm) cj(h)] -h 2 E[ci(cm) dj(h)] -h 2 E[cj(h) di(cm)] - 


37 


























tr(Dc^_i _2 Sc^+ 2 ) tr(D/i +2 S/i+ 2 ). Stationarity and m-dependency, arguments also used in 


Lemma 14 allow us to get that the second, third and fourth summands above are sums 
of stationary moments of second, third and fourth order, respectively. Hence the contribu¬ 
tion of these terms to (B.ll) is of order It is not difficult to see that for given indeces 

i and j, Ci{m) Cj{h) is the sum of 8 terms of the form (/^ - fi+(^rn+i)){fj - fj+h){£i - + 

ei+ 2 (m+i)){,£ j —£j+h) and due to stationarity and m-dependency, E[cj(m) Cj{h)] is bounded by 


\fi - /i+(m+i))(/j - fj+h)\- Now, since Sk{i) = fi+k - fi+m+i, see notation in Appendix |A^ 
we utilize the ideas leading to the bound of VAR(7 q™^( 1)), cf. (B.5), and obtain that 


nh 


K„ 


-/>+/.)! = o (5^+ ( 


i=l j=l 




Kn 

E 

i=i 


n 


(B.12) 


Thus for dh,m = 1, the result follows by a combination of Eqs. (B.ll) and (B.12). For 
the other values of dh,m-i cf. (2.14) in the Introduction, we mimick the calculations above to 
complete the proof. □ 
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